Abstract

This paper describes the method we used in the diversity task of the Web Track
at TREC 2009.

The problem we aim to solve is the diversification of search results for
ambiguous web queries. We present a model based on knowledge of the diversity
of query subtopics to generate a diversified ranking for retrieved documents.
We expand the original query into several related queries, assuming that query
expansions expose subtopics of the original query. Moreover, each query
expansion is given a weight which reflects the likelihood of the interpretation
(the fraction of users who issued this query given the general query topic). We
issue all the expanded queries, including the original query, to a standard
BM25 search engine, then re-rank the retrieved documents to generate the final
ranking. Our method can detect possible subtopics of a given query and provide
a reasonable ranking that satisfies both relevance and diversity metrics. The
TREC evaluations show our method is effective on the diversity task.


1  Introduction 

Ambiguous queries are those having more than one interpretation. When ranking
documents relevant to an ambiguous query, a search engine should not only
consider document relevance, but also provide documents that satisfy different
interpretations of the query. A good ranking for an ambiguous query should
maximize the satisfaction of average users by covering a variety of subtopics
(more specific topics) in which searchers could be interested. In order to
generate such a ranking, our system makes use of prior knowledge of the
subtopics of a query and statistical information about user intent on these
subtopics.

For most current web search engines, there are two main problems in the
diversity task. The first is the granularity of diversity (that is, the level
of diversity). The second is how to rank the retrieved documents. Previous
research usually models the diversity task as picking a diverse subset from all
subtopics, without considering the ranking order among the retrieved documents.

In this paper, we introduce a model based on understanding the diversity of
subtopics to achieve an optimal ranking for retrieved documents. Our model
tries to optimize the whole retrieval system. It issues multiple queries to
obtain diverse results. It not only tries to return a set of documents that
covers all the subtopics of a given query, but also uses statistical
information about users' intent on different subtopics to rerank the retrieved
documents.

2  Related  Work 

There has been previous work on the diversity task. Carbonell and Goldstein [2]
first introduced a preliminary model for diversity-based reranking, maximal
marginal relevance (MMR). Their model not only focuses on the relevance of the
documents but also maximizes dissimilarity among the retrieved documents. Vee
et al. [11] formalized diversity in structured search. They proposed algorithms
that return diversified results by using a B+-tree. However, the algorithms are
not suitable for unstructured search such as web search. Diversity is also a
problem in recommendation, which is very similar to retrieval. Ziegler et al.
[14] improved recommendation systems by introducing topic diversification. They
reviewed the main evaluation metrics in recommendation systems and designed
their own metrics, which involve diversification. For each product candidate,
their algorithm measures the dissimilarity between the item and the rest of the
product candidates and combines this dissimilarity with the original relevance
order. Yu et al. [12] introduced an explanation-based diversity algorithm for
recommendation systems and formalized the diversification problem as a
compromise between accuracy and diversity. The algorithm tries to maximize
diversity under relevance constraints. The diversity task is also closely
related to redundancy detection. Chen et al. [3] considered that users are
satisfied with a limited number of relevant documents, rather than needing all
relevant documents. Their basic idea is that documents should be selected
sequentially according to the probability of the document being relevant
conditioned on the documents that come before.

Researchers have also obtained results on query understanding and used it to
improve search results. Song et al. [10] studied ambiguous queries. Their
experiments showed that 16% of queries are ambiguous. Radlinski and Dumais [8]
used the query log to extract queries related to a given query. Using these
related queries, they diversify and rerank the results. Their method is shown
to be promising for improving the performance of a search engine. Hu et al. [7]
designed a method to predict a user's query intent using Wikipedia. They
predefine three domains: personal name, travel, and job. For each domain, they
use a random walk on the graph created from Wikipedia to generate a probability
vector. When a query comes in, it is mapped to a set of Wikipedia articles and
their probabilities are added together; if the sum exceeds a threshold, the
query is considered to have the intent of that domain.

Recently, Gollapudi and Sharma [5] presented an approach to characterizing
diversification systems using a set of natural axioms. The choice of axioms
provides a way to make objective functions independent of the distance and
relevance functions. Agrawal et al. [1] formulated the problem of diversifying
search results theoretically. The idea is to assume that users only consider
the top k returned results. The objective is to maximize the probability that
the average user finds at least one useful result within the top k results.
They prove that this problem is NP-hard and propose a greedy algorithm.

For evaluation of the diversity task, Zhai et al. [13] proposed a framework for
evaluating subtopic retrieval which generalizes the traditional precision and
recall metrics by accounting for intrinsic topic difficulty as well as
redundancy. They also present two ways to measure novelty and then combine them
with relevance under the MMR strategy. Clarke et al. [4] also propose an
evaluation framework: alpha-nDCG. They extend the widely used evaluation
measure nDCG by adding diversity and novelty under the probability ranking
principle (PRP).
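
To make the novelty discount in alpha-nDCG concrete, the following is a minimal
sketch of the unnormalized alpha-DCG gain as we understand it from Clarke et
al. [4]. It is an illustration, not the official TREC evaluation code. The
function name alpha_dcg and the representation of relevance judgments as a set
of subtopic labels per document are assumptions introduced here. Normalizing by
an ideal ranking, which gives alpha-nDCG, is omitted.

```python
import math
from collections import Counter


def alpha_dcg(ranking, judgments, alpha=0.5, depth=10):
    """Unnormalized alpha-DCG at `depth` (illustrative sketch).

    ranking   -- list of document ids, best first
    judgments -- hypothetical dict: doc id -> set of subtopics it is relevant to
    """
    seen = Counter()  # how many earlier documents covered each subtopic
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        subtopics = judgments.get(doc, set())
        # Each repeat of a subtopic is worth (1 - alpha) times the previous one.
        gain = sum((1.0 - alpha) ** seen[s] for s in subtopics)
        # Standard DCG position discount.
        score += gain / math.log2(1 + rank)
        seen.update(subtopics)
    return score
```

Under this sketch, a ranking that covers a new subtopic early scores higher
than one that repeats an already covered subtopic first, which is the behavior
the measure is meant to reward.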

3  Method 

Unlike most previous methods, our method directly incorporates user intent data
into the ranking model. Specifically, we discover subtopics of a given query
from frequently issued similar queries and human knowledge collections, and
then estimate users' interest in each subtopic using previous query
frequencies.

Figure 1 shows an example to illustrate how our algorithm works. In this
example, there are three subtopics under the given query: $c_1$, $c_2$, and
$c_3$. Under these subtopics, there are web pages $p_1$ through $p_5$ that are
relevant to at least one subtopic. Although $p_3$ is more relevant to subtopic
$c_2$ than $p_5$ is to $c_3$, given that $c_2$ attracts less user interest than
$c_3$, $p_3$ should still be ranked lower than $p_5$.

Figure 1: An example of user intent sensitive ranking.

We formalize our method below. Given a query $q$, the probability that a
retrieved document $d$ meets the user's intent can be written as

$$P(I \mid q, d) = \frac{P(I = 1 \mid q)\, P(d \mid I = 1, q)}{P(d \mid q)} \tag{1}$$


Since every document $d$ for a given query $q$ is retrieved by a basic
retrieval method, we assume that, without considering the user's intent, the
probability of each retrieved document given the query, $P(d \mid q)$, is the
same across all retrieved documents. $P(I = 1 \mid q)$ likewise does not depend
on $d$. Thus, we ignore $P(I = 1 \mid q)$ and $P(d \mid q)$ in the following,
which leads us to


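$$P(d \mid I = 1, q) \propto P(I \mid q, d) \tag{2}$$

Now we take subtopic information into consideration, where $c_i$ ($i = 1..k$)
represents a subtopic associated with query $q$.

$$\begin{aligned}
P(d \mid I = 1, q) &= P(d \mid c_1, I = 1, q) \times P(c_1 \mid I = 1, q) \\
&\quad + P(d \mid c_2, I = 1, q) \times P(c_2 \mid I = 1, q) \\
&\quad + \dots \\
&\quad + P(d \mid c_k, I = 1, q) \times P(c_k \mid I = 1, q) \\
&= \sum_{c} P(d \mid c, I = 1, q) \times P(c \mid I = 1, q)
\end{aligned}$$

Thus, we have

$$P(d \mid I = 1, q) \propto \sum_{c} P(d \mid c, I = 1, q) \times P(c \mid I = 1, q) \tag{3}$$

From (1) and (3), we have

$$P(I \mid q, d) \propto \sum_{c} P(d \mid c, I = 1, q) \times P(c \mid I = 1, q) \tag{4}$$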

We generate the final ranking using the $P(I \mid q, d)$ value computed above.
Therefore, in order to rank the retrieved documents, we need to discover the
set of subtopics $c_i$, and estimate $P(d \mid c, I = 1, q)$ and
$P(c \mid I = 1, q)$ for each $c_i$. We describe how in the following section.
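
As a hedged illustration of equation (4), consider the situation of Figure 1
with made-up numbers; none of the values below come from our data. Assume
$p_3$ is relevant only to $c_2$ and $p_5$ only to $c_3$, with
$P(p_3 \mid c_2, I = 1, q) = 0.6$, $P(p_5 \mid c_3, I = 1, q) = 0.4$,
$P(c_2 \mid I = 1, q) = 0.2$, and $P(c_3 \mid I = 1, q) = 0.5$. Then

$$\begin{aligned}
P(I \mid q, p_3) &\propto 0.6 \times 0.2 = 0.12, \\
P(I \mid q, p_5) &\propto 0.4 \times 0.5 = 0.20,
\end{aligned}$$

so $p_5$ is ranked above $p_3$ even though $p_3$ is the stronger match within
its own subtopic.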

4  Experiments 

In the previous section, we transformed the problem of ranking retrieved
documents into estimating two probabilities, $P(d \mid c, I = 1, q)$ and
$P(c \mid I = 1, q)$, for each subtopic $c_i$ of a given query. Here, we
describe our method for estimating them.

4.1  Estimate  $P(c \mid I = 1, q)$

We used Google Insights for Search [6] and Wikipedia to extract the possible
subtopics. From Google Insights for Search, we can obtain a set of related
queries as well as their relative weights. Such queries can be considered
candidate subtopics. Figure 2 shows the information we can get for the query
“map” from Google Insights for Search. The relative weight of each related
query gives us a convenient way to estimate $P(c \mid I = 1, q)$. However, as
we can see from the example, some of the related queries cannot be seen as
reasonable subtopics.

We also obtain the list of articles from Wikipedia disambiguation pages for a
given query, and use these articles as candidate subtopics. Because it is
maintained by humans, the list of subtopics obtained from Wikipedia is more
reasonable than the list from Google. However, the lack of relative weights
such as those present in Google makes it a challenge to estimate
$P(c \mid I = 1, q)$.

We also use Wikipedia to try to refine these related queries. Wikipedia is a
knowledge pool which is maintained manually. The list of article titles in a
disambiguation page can be treated as subtopics of the title of this
disambiguation page. We use disambiguation pages to find related queries which
are more reasonable than those from Google Insights for Search. However, there
are no relative weights on the related queries from Wikipedia. We then merge
the two sets of related queries.

Figure 2: Information from Google Insights for Search for the query “map”. Top
searches listed: 1 google map, 2 google, 3 map quest, 4 world map, 5 maps,
6 street map, 7 london map, 8 us map, 9 road map, 10 europe map.

For a given query $q$, we denote the set of candidate subtopics obtained from
Google as $Q_g(q)$ and the set of subtopics from Wikipedia as $Q_w(q)$. The two
sets of subtopics are merged using the following rules, where $r$ is a related
query.

1. $r \in Q_g(q) \cap Q_w(q)$

We use it as a subtopic and its relativity as $P(c \mid I = 1, q)$.

2. $r \in Q_g(q) \setminus Q_w(q)$

We consider that it may not be reasonable and drop it.

3. $r \in Q_w(q) \setminus Q_g(q)$

Since most of the related queries from Wikipedia are reasonable, they should
be kept as subtopics. However, the problem is that we cannot estimate
$P(c \mid I = 1, q)$ directly. Because such a query does not appear in
$Q_g(q)$, we assume that very few people want this subtopic. We assign this
related query the minimal relativity, that is, the minimum of the relativity
values of the top 10 related queries in $Q_g(q)$.

Note that besides the subtopics obtained using the above method, we also
consider the original query $q$ as one of the subtopics of itself, and assign
it the maximal weight. Then, we normalize the weights of the subtopics to sum
to 1. These weights are used as the estimated values of $P(c \mid I = 1, q)$.
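
The merging rules and the normalization above can be sketched as follows. This
is a minimal illustration under stated assumptions, not our exact
implementation: the function name merge_subtopics is hypothetical, candidates
from the two sources are matched by case-insensitive string equality, the
"maximal weight" for the original query is taken to be the largest Google
relativity, and the dictionary of Google weights is assumed to hold the top 10
related queries.

```python
def merge_subtopics(query, google_weights, wiki_titles):
    """Merge candidate subtopics and return weights that sum to 1.

    query          -- the original query string
    google_weights -- dict: related query -> relativity (assumed top 10)
    wiki_titles    -- iterable of titles from the disambiguation page
    """
    google = {q.lower(): float(w) for q, w in google_weights.items()}
    wiki = {t.lower() for t in wiki_titles}

    # Fallbacks of 1.0 are an assumption for the case of no Google data.
    floor = min(google.values()) if google else 1.0
    ceiling = max(google.values()) if google else 1.0

    weights = {}
    for topic in wiki:
        # Rule 1: in both sets, keep the Google relativity.
        # Rule 3: Wikipedia only, assign the minimal relativity.
        weights[topic] = google.get(topic, floor)
    # Rule 2: Google-only candidates are never added, i.e. dropped.

    # The original query is a subtopic of itself with the maximal weight.
    weights[query.lower()] = ceiling

    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}
```

The returned dictionary plays the role of $P(c \mid I = 1, q)$, one entry per
retained subtopic plus the original query.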

4.2  Estimate  $P(d \mid c, I = 1, q)$

We treat each subtopic as a query and retrieve 1000 documents for it using
standard BM25 [9]. The scores generated by BM25 are then normalized to sum to
1. We use these normalized BM25 scores as the probability
$P(d \mid c, I = 1, q)$.
              >median   =median   <median
alpha-nDCG       28        13         9
IA               27        16         7

Table 1: For each query, we compare our evaluation scores with the median
scores. >median is the number of queries where our score is better than the
median score, =median is the number of queries where our score is equal to the
median score, and <median is the number of queries where our score is worse
than the median score. There are 50 queries in total.


For each document, after we obtain $P(d \mid c, I = 1, q)$ and
$P(c \mid I = 1, q)$, we combine and rerank all the documents retrieved for
each subtopic using equation (4).
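
The combination step can be sketched as below. This is a hedged illustration
of equation (4) rather than our production code: the function names normalize
and rerank, the dictionary layout of the BM25 output, and the tie-break on
document id are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(scores):
    """Scale raw BM25 scores for one subtopic so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        return {doc: 0.0 for doc in scores}
    return {doc: s / total for doc, s in scores.items()}


def rerank(subtopic_weights, bm25_results):
    """Rank documents by the sum over subtopics of P(d|c) * P(c).

    subtopic_weights -- dict: subtopic -> estimated P(c | I=1, q)
    bm25_results     -- dict: subtopic -> {doc id: raw BM25 score}
    """
    combined = defaultdict(float)
    for topic, weight in subtopic_weights.items():
        per_topic = normalize(bm25_results.get(topic, {}))
        for doc, p_doc in per_topic.items():
            # A document retrieved for several subtopics accumulates
            # one term per subtopic, as in the sum of equation (4).
            combined[doc] += p_doc * weight
    # Highest combined score first; ties broken by document id (assumption).
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

A document that is only moderately relevant to a popular subtopic can
therefore outrank a document that is highly relevant to a rarely intended one.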

4.3  TREC  Diversity  Results 

The TREC organizers provide two performance metrics. The first is alpha-nDCG,
as defined by Clarke et al. [4]. In the evaluation, alpha-nDCG@10 is used with
the parameter alpha = 0.5. The second is an “intent aware” (IA) version of
precision, as defined by Agrawal et al. [1], where all intents are given equal
weight. For each query, we compare our evaluation scores to the median scores
of all participants. There are 50 queries in total in the test data. Table 1
shows the results of comparing our scores with the median scores. The results
are similar under the two measures. For more than half of the queries, the
evaluation scores of our method are better than the median; for more than ten
queries, our scores are equal to the median scores; and for fewer than ten
queries, our scores are worse than the median scores. Overall, the results
show that our method is effective but still needs improvement.
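
The per-query comparison summarized in Table 1 amounts to a simple tally,
sketched here under assumptions: the function name compare_to_median and the
input layout are hypothetical, equality is tested exactly on the reported
scores, and whether our own run is part of the pool used for the median is not
specified by this sketch.

```python
from statistics import median


def compare_to_median(our_scores, all_runs_scores):
    """Count queries where our score is above, equal to, or below the median.

    our_scores      -- dict: query id -> our score on one metric
    all_runs_scores -- dict: query id -> list of scores from all participants
    """
    counts = {">median": 0, "=median": 0, "<median": 0}
    for query, ours in our_scores.items():
        med = median(all_runs_scores[query])
        if ours > med:
            counts[">median"] += 1
        elif ours == med:
            counts["=median"] += 1
        else:
            counts["<median"] += 1
    return counts
```

Running this once per metric would produce one row of a table shaped like
Table 1.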

5  Summary  and  Future  Work 

We studied the problem of diversifying search results for ambiguous web
queries. Understanding the subtopics of ambiguous queries and user intent on
these subtopics is very important for this problem. Our method maximizes the
satisfaction of average users by ranking documents relevant to a variety of
subtopics, considering the probability of user intent given the query. The
TREC evaluations show our method is effective on the diversity task.

Further work is still needed. The first problem is how to better estimate
$P(c \mid I = 1, q)$ and $P(d \mid c, I = 1, q)$. In our experiment, for
$P(c \mid I = 1, q)$, we only borrow information from Google Insights for
Search and Wikipedia. In the future, we plan to mine query logs and document
content to obtain potential subtopics. Second, we did not detect duplicates.
Duplicate detection is expected to affect our diversity method. In the future,
we plan to detect near-duplicate documents before the diversification stage.
Finally, we will consider a finer granularity of diversity, given enough user
intent data. A multi-level diversity model can also be built.